feat(learning): teleop + dataset collection + dataprep pipeline #2446
ruthwikdasyam wants to merge 51 commits into
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##             main    #2446      +/-   ##
==========================================
+ Coverage   70.81%   71.03%   +0.22%
==========================================
  Files         862      874      +12
  Lines       77475    78617    +1142
  Branches     6882     7011     +129
==========================================
+ Hits        54862    55849     +987
- Misses      20818    20947     +129
- Partials     1795     1821      +26
... and 6 files with indirect coverage changes
def _on_buttons(self, msg: Buttons) -> None:
    """Rising-edge detect against `config.button_map`; advance state machine."""
    ts = time.time()
    for event_name, alias_or_attr in self.config.button_map.items():
        attr = BUTTON_ALIASES.get(alias_or_attr, alias_or_attr)
        try:
            pressed = bool(getattr(msg, attr))
        except AttributeError:
            continue
        prev = self._prev_bits.get(attr, False)
        self._prev_bits[attr] = pressed
        if pressed and not prev:  # rising edge
            self._transition(event_name, ts)
There is a concept of pre-roll and post-roll in data collection: we send a stream of zeros before activation (the rising edge, in this case) and again after deactivation.
This marks the exact start and stop points of an episode in the data itself.
Zeros work for Twist commands; for joint-position commands we would probably need to stream the current joint positions instead.
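The pre-roll/post-roll idea above can be sketched as a small padding helper. This is only an illustration, not code from the PR: `Sample`, `pad_episode`, `n_pad`, `dt`, and `hold_last` are hypothetical names, and the padding length and spacing are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical command sample: a timestamp plus a flat payload."""
    ts: float                  # seconds
    values: tuple[float, ...]  # e.g. (vx, vy, wz) for a Twist, or joint angles


def pad_episode(
    samples: list[Sample],
    n_pad: int,
    dt: float,
    hold_last: bool,
) -> list[Sample]:
    """Surround an episode with n_pad neutral samples on each side.

    hold_last=False pads with zeros (suits velocity-style Twist commands).
    hold_last=True repeats the boundary sample (suits joint positions,
    where an all-zero command would mean "move to the zero pose").
    """
    if not samples:
        return []
    first, last = samples[0], samples[-1]
    zeros = tuple(0.0 for _ in first.values)
    pre_vals = first.values if hold_last else zeros
    post_vals = last.values if hold_last else zeros
    # Pre-roll timestamps count up to, but exclude, the first real sample.
    pre = [Sample(first.ts - dt * (n_pad - i), pre_vals) for i in range(n_pad)]
    # Post-roll timestamps start one step after the last real sample.
    post = [Sample(last.ts + dt * (i + 1), post_vals) for i in range(n_pad)]
    return pre + samples + post
```

With zero padding, an episode boundary shows up in the data as a run of all-zero commands; with `hold_last=True`, it shows up as a run of identical joint targets.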
`_on_buttons` is a callback that reads button presses.
Regardless of which data we are collecting, it marks start and stop checkpoints in the data stream, which lets us trim the whole session into episodes.
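Trimming a session into episodes from start/stop checkpoints could look roughly like the sketch below. It assumes events arrive as `(timestamp, name)` pairs using the `start`/`save`/`discard` names from this PR; `episode_ranges` and `trim` are hypothetical helpers, not the PR's API.

```python
from bisect import bisect_left, bisect_right


def episode_ranges(events: list[tuple[float, str]]) -> list[tuple[float, float]]:
    """Turn (timestamp, event) pairs into (start_ts, end_ts) for saved episodes."""
    ranges: list[tuple[float, float]] = []
    start = None
    for ts, name in sorted(events):
        if name == "start":
            start = ts  # a repeated "start" simply restarts the episode
        elif name == "save" and start is not None:
            ranges.append((start, ts))
            start = None
        elif name == "discard":
            start = None  # drop the open episode, if any
    return ranges


def trim(stream_ts: list[float], start: float, end: float) -> range:
    """Indices of samples with start <= ts <= end; stream_ts must be sorted."""
    return range(bisect_left(stream_ts, start), bisect_right(stream_ts, end))
```

Because the checkpoints are timestamps rather than sample indices, the same ranges can be applied to every recorded stream, whatever its rate.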
episodes_saved: int
episodes_discarded: int
current_episode_start_ts: float | None
last_event: Literal["start", "save", "discard", "init"] = "init"
A little confused here. Why is `EpisodeStatus` counting `episodes_saved` / `episodes_discarded`?
A single session would have multiple episodes inside it.
Yes. Once we start a blueprint, we collect multiple episodes.
As we keep collecting, this message is a live indication of how many episodes have been saved and how many discarded.
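To make the session-level counters concrete, here is a minimal sketch of how such a status could evolve over one session. The field names mirror the snippet under review, but the `Status` class and `apply_event` function are hypothetical stand-ins, and ignoring a `save`/`discard` with no open episode is an assumption rather than the PR's confirmed behavior.

```python
from dataclasses import dataclass, replace
from typing import Literal, Optional

Event = Literal["start", "save", "discard"]


@dataclass(frozen=True)
class Status:
    # Field names follow the reviewed snippet; the class itself is a stand-in.
    episodes_saved: int = 0
    episodes_discarded: int = 0
    current_episode_start_ts: Optional[float] = None
    last_event: Literal["start", "save", "discard", "init"] = "init"


def apply_event(s: Status, event: Event, ts: float) -> Status:
    """Hypothetical transition: return the status after one button event."""
    if event == "start":
        return replace(s, current_episode_start_ts=ts, last_event="start")
    if s.current_episode_start_ts is None:
        return s  # assumption: save/discard with no open episode is ignored
    if event == "save":
        return replace(
            s,
            episodes_saved=s.episodes_saved + 1,
            current_episode_start_ts=None,
            last_event="save",
        )
    return replace(
        s,
        episodes_discarded=s.episodes_discarded + 1,
        current_episode_start_ts=None,
        last_event="discard",
    )
```

The counters are cumulative for the session, while `current_episode_start_ts` describes only the episode in progress.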
…tly dropping the obs
…oist inner imports
Problem
We had no path from a teleop session to a trainable dataset — robot streams were recorded ad hoc and there was no standard, format-agnostic way to segment episodes and export them for policy learning.
Closes DIM-XXX
Solution
Adds an end-to-end data-collection → dataset-prep pipeline:
- `dimos/learning/collection/`: `EpisodeMonitorModule` turns Quest buttons/keyboard into `start`/`save`/`discard` episode events; `CollectionRecorder` captures the obs/action/status streams to a SQLite session DB. Two ready blueprints: `learning-collect-quest-xarm7` and `learning-collect-quest-piper`.
- `dimos/learning/dataprep/`: reads the session DB, extracts episodes (`episode_status` or explicit `ranges`), time-syncs obs/action streams onto a common timeline, and writes LeRobot v2 or HDF5 datasets with streaming per-feature stats + a `dimos_meta.json` sidecar. Pure core / impure build split for testability.
- `dimos dataprep build` and `dimos dataprep inspect`.
How to Test
Contributor License Agreement